Application of a deep learning image classifier for identification of Amazonian fishes

Abstract Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon, is needed. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U‐Net) and a convolutional neural net (CNN) to classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019. Species identifications in the training images (n = 3068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.


| INTRODUCTION
The Amazon basin is home to over 2700 species of freshwater fishes (Dagosta & De Pinna, 2019; Junk et al., 2007), many of which are of conservation concern (Albert et al., 2011; García-Dávila et al., 2018; Pelicice et al., 2021). Freshwater fishes provide one of the few reliable sources of protein for Amazonian communities and represent an important economic opportunity through the aquarium trade (Coomes et al., 2010; Moreau & Coomes, 2007). This unique ichthyofauna is facing unprecedented threats, such as deforestation (Junk et al., 2007; Lobón-Cerviá et al., 2015), construction of hydropower dams (Winemiller et al., 2016), mining (Azevedo-Santos et al., 2021), climate change (Bodmer et al., 2017), and, in some cases, overexploitation (Moreau & Coomes, 2007). While advances in sampling poorly explored areas and describing the diversity of Amazonian fishes have been made over the last decade (e.g., Alofs et al., 2014; de Santana et al., 2019, 2021), the sub-drainages of the Marañón River remain among the most undersampled regions in South America (Jézéquel et al., 2020). In less populated areas of the Amazon, subsistence fishing, for both consumption and the pet trade, can be essential to sustaining life (Coomes et al., 2010; Moreau & Coomes, 2007). Given the urgency of these economic and ecological threats, efficient data collection and long-term monitoring are needed to better inform mitigation strategies and policy.
Traditional ichthyological sampling methods include focused netting and fishing efforts, followed by extensive manual sorting, documentation, and identification. Although effective, and necessary in the Amazon, where countless fish species remain to be described (Reis et al., 2016), these methods are time-consuming and introduce the potential for misidentification bias (Kirsch et al., 2018).
As a result, many have turned to the assistance of community scientists to aid in catch effort and identification of individual landings, yet accurate species identification remains a challenge (Gardiner et al., 2012;Swanson et al., 2016). Genetic approaches have also been implemented to identify many of the fish species inhabiting the Amazon (de Santana et al., 2021;García-Dávila et al., 2017), but these approaches also rely on well-identified and vouchered genetic libraries that are still missing for Amazonian fishes. These techniques require expensive storage and sample processing technology, which are not readily available in most institutions within the Amazon (de Santana et al., 2021). In order to address the ever-growing need for data and cost-effective solutions, contemporary fisheries research has called for the development and application of a rapid solution, namely by way of machine learning models, such as Convolutional Neural Networks (CNNs, e.g., Perdigão et al., 2020). CNNs have the potential to enable rapid identification of fish to monitor fishery stocks, diversity, bycatch, and to combat illegal fishing (Marini et al., 2018;Perdigão et al., 2020).
Machine learning techniques have been successfully implemented in niche modeling, prediction of mass mortality events, and the development of non-linear ecological time-series models (Crisci et al., 2012;Miller-Coleman et al., 2012;Recknagel, 2001). Image classification deep learning models show promise in being applied to highly diverse taxa and collections (Borowiec et al., 2021;Norouzzadeh et al., 2018;Sullivan et al., 2018;Wäldchen & Mäder, 2018;Schuettpelz et al., 2017;Weinstein, 2017). Past attempts to identify fish taxa using computer vision have had varying degrees of success across a wide breadth of ichthyological data sets. For example, early attempts by Alsmadi et al. (2010) were able to identify 20 families of marine fish from 610 images with an accuracy of 84%. More recent work improved accuracy to 90% (Alsmadi et al., 2019). Hernández-Serna and Jiménez-Segura (2014) used seven museum collections that included both marine and Amazonian freshwater fish (images per collection ranged from 422 to 2392) and obtained accuracies between 72% and 92%. Sun et al. (2016) obtained a species identification accuracy of 77.27% from 9160 AUV images of fish. A study by Qin et al. (2015) was able to identify 23 deep sea fish species with an accuracy of 98% using a substantial number of training images (n = 22,370).
In this study, we developed two deep learning computer vision models: one that segments fish pixels from background pixels, and one that classifies images of Amazonian fishes to the genus level. As the first image classifier for ichthyological monitoring in the megadiverse Peruvian Amazon basin, we hope this case study will act as a primer for further development of deep learning models, as tools for conservation stakeholders. Deep learning for taxonomic image classification has proven to be efficient and highly accurate, demonstrating promise for improving participatory monitoring initiatives (Norouzzadeh et al., 2018;Sullivan et al., 2018). Specifically, these tools will enable communities involved in participatory monitoring to fill knowledge gaps and improve data reliability. These models can also provide a basis on which to build new models for other species of conservation concern and public health interest. Our data and pipeline are publicly available, which will enable others to apply these techniques to other taxa.

| METHODS
In July 2018, we sampled freshwater fishes in small white-water rivers […]. A total of 141 fish species belonging to 89 genera and 29 families were identified across all sites and seasons (M. Ruiz-Tafur, unpublished data). Captured fish (n = 1967) were placed on a 1 cm grid or a neutral background (leaves, hands, ground, etc.) and photographed with a Nikon D3500 camera prior to preservation. Specimens were deposited in the ichthyology collection at the Instituto de Investigaciones de la Amazonia Peruana (IIAP) in Iquitos, Peru. Because we had few images per species, we restricted our analysis to genera (n = 33), using a minimum threshold of 20 field images per genus (n = 1615 images). To supplement the field images, we incorporated additional images (n = 1453) of specimens housed in the Smithsonian National Museum of Natural History, Department of Vertebrate Zoology, Division of Fishes collection (USNM), taken with Nikon B500 and W100 cameras. These specimens were photographed on both blank and 1 cm grid backgrounds from multiple angles. In total, our dataset consisted of 3068 images prior to processing.
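The genus-level threshold described above can be sketched as a simple counting filter. This is a minimal illustration, not the authors' pipeline: the `(image_name, genus, source)` record layout and the function names are hypothetical, and we assume that only field images count toward the 20-image threshold while museum images are kept only for genera that pass it.

```python
from collections import Counter

# Minimum number of field images a genus needs to be included (value from the text).
MIN_FIELD_IMAGES = 20


def genera_meeting_threshold(records, min_field=MIN_FIELD_IMAGES):
    """Return the genera that have at least `min_field` field images.

    `records` is a list of (image_name, genus, source) tuples, where source is
    "field" or "museum" (a hypothetical layout). Museum images do not count
    toward the threshold.
    """
    field_counts = Counter(genus for _, genus, source in records if source == "field")
    return {genus for genus, n in field_counts.items() if n >= min_field}


def filter_records(records, min_field=MIN_FIELD_IMAGES):
    """Keep field and museum images only for genera that pass the threshold."""
    keep = genera_meeting_threshold(records, min_field)
    return [record for record in records if record[1] in keep]


# Toy example: GenusA has 25 field images, GenusB only 5.
records = (
    [(f"a{i}.jpg", "GenusA", "field") for i in range(25)]
    + [(f"b{i}.jpg", "GenusB", "field") for i in range(5)]
    + [("m1.jpg", "GenusA", "museum"), ("m2.jpg", "GenusB", "museum")]
)
print(genera_meeting_threshold(records))  # {'GenusA'}
print(len(filter_records(records)))       # 26
```

Counting only field images keeps the threshold tied to what was actually observed in the study area, so museum photographs cannot by themselves bring a genus into the model.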

| Preprocessing steps
To build a training dataset, we first removed all incidental and non-fish images, as well as images of unidentified fish. We then built a U-Net (Ronneberger et al., 2015) segmentation model to classify each pixel in an image as fish or background, using methods similar to those of White et al. (2020). Specifically, following White et al. (2020), we manually masked a subset of images (n = 66; 2 images from each genus) and used them as the U-Net training set. Our masks zeroed out (blacked out) background pixels while retaining fish pixels. The U-Net was built on a ResNet-34 architecture pretrained on the ImageNet dataset (Deng et al., 2009). All field and museum images were then masked by the trained U-Net. Images that were unsuccessfully masked, in which no part of the fish remained, were removed from the dataset.
The remaining images, which retained at least part of the fish with the background removed, were then divided into training and validation sets for the genus identification model.
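The masking and filtering step can be sketched as follows. This is a dependency-free illustration under stated assumptions: images are represented as nested lists of RGB tuples and the predicted mask as nested lists of 0/1 values (1 = fish), whereas in practice the mask would come from the trained U-Net. The helper names are hypothetical, and "unsuccessfully masked" is approximated as "the mask keeps zero fish pixels".

```python
# Images are nested lists of (R, G, B) tuples and masks are nested lists of
# 0/1 values (1 = fish, 0 = background), to keep the sketch dependency-free.


def apply_mask(image, mask):
    """Zero out (black) every background pixel while keeping fish pixels."""
    return [
        [pixel if keep else (0, 0, 0) for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def fish_fraction(mask):
    """Fraction of pixels that the mask labels as fish."""
    total = sum(len(row) for row in mask)
    fish = sum(sum(row) for row in mask)
    return fish / total if total else 0.0


def mask_and_filter(samples):
    """Mask each (name, image, mask) sample; drop samples with no fish pixels left."""
    kept = {}
    for name, image, mask in samples:
        if fish_fraction(mask) == 0.0:
            continue  # masking failed: nothing of the fish remains
        kept[name] = apply_mask(image, mask)
    return kept


image = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (1, 2, 3)]]
good_mask = [[1, 0], [0, 1]]
empty_mask = [[0, 0], [0, 0]]
result = mask_and_filter([("ok.jpg", image, good_mask), ("failed.jpg", image, empty_mask)])
print(list(result))      # ['ok.jpg']
print(result["ok.jpg"])  # [[(10, 20, 30), (0, 0, 0)], [(0, 0, 0), (1, 2, 3)]]
```

With array libraries the same operation is a single element-wise multiply of the image by the mask; the list version is used here only so the sketch runs without extra dependencies.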

| Identification model architecture, training, and validation
We trained our image classifier to distinguish among 33 fish genera based on masked images. The classifier was developed on an NVIDIA V100 GPU (32 GB VRAM) using the fastai library (Howard & Gugger, 2020) in PyTorch (Paszke et al., 2019). The model was built on a ResNet-101 architecture pretrained on the ImageNet dataset (Deng et al., 2009). Masked images were randomly divided into training (n = 2387) and validation (n = 596) sets, an 80/20 split chosen to maximize accuracy (Hernández-Serna & Jiménez-Segura, 2014). All images were resized to 300 × 300 pixels by 'squishing' them, that is, rescaling without preserving the aspect ratio. We trained our model over […]
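The reported set sizes are consistent with a random 80/20 split in which the validation size is truncated rather than rounded: 20% of the 2983 masked images (2387 + 596) is 596.6, which truncates to 596. The sketch below is a minimal stand-in for such a split; the function name and the seed value are hypothetical, and the source does not state which splitting routine was used.

```python
import random


def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and split it into (train, valid).

    The validation size is int(valid_pct * n), i.e. truncated rather than rounded.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = int(valid_pct * len(items))
    return items[n_valid:], items[:n_valid]


# 2387 + 596 = 2983 masked images in total.
train, valid = random_split(range(2983))
print(len(train), len(valid))  # 2387 596
```

As we understand it, fastai's `RandomSplitter(valid_pct=0.2)` also truncates the validation count in this way, and a squish resize of the kind described corresponds to fastai's `Resize(300, method=ResizeMethod.Squish)`; whether the authors used exactly these components is an assumption.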

| DISCUSS ION
We were able to efficiently build a state-of-the-art model which can rapidly identify standardized Amazonian fish images to the genus level (n = 33) with 97.99% accuracy, in line with the results of other deep learning fish studies implementing image classifiers (Alsmadi et al., 2019;Qin et al., 2015). Of the 12 incorrectly classified images in our validation set, 7 were misclassified outside of their family, while 2 images were misclassified outside of their order. Results demonstrate the importance of image quality, image quantity, and taxonomic specificity to generating image classification models that will prove useful for identifying diverse taxa in remote geographic settings.
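The accuracy figure follows directly from the validation counts: (596 − 12) / 596 = 584 / 596 ≈ 0.9799. The sketch below reproduces that arithmetic and shows one way to bin misclassifications by taxonomic level. The genus-to-(family, order) mapping and the predictions are hypothetical placeholders, not data from this study.

```python
def accuracy(n_total, n_wrong):
    """Proportion of correctly classified images."""
    return (n_total - n_wrong) / n_total


def error_levels(pairs, taxonomy):
    """Bin each misclassification by the highest rank at which it is wrong.

    `pairs` holds (true_genus, predicted_genus); `taxonomy` maps each genus to
    (family, order). Correct predictions are skipped.
    """
    counts = {"same_family": 0, "other_family_same_order": 0, "other_order": 0}
    for true, pred in pairs:
        if true == pred:
            continue
        true_family, true_order = taxonomy[true]
        pred_family, pred_order = taxonomy[pred]
        if true_order != pred_order:
            counts["other_order"] += 1
        elif true_family != pred_family:
            counts["other_family_same_order"] += 1
        else:
            counts["same_family"] += 1
    return counts


print(f"{accuracy(596, 12):.2%}")  # 97.99%

# Hypothetical taxonomy and predictions, for illustration only.
taxonomy = {"G1": ("F1", "O1"), "G2": ("F1", "O1"), "G3": ("F2", "O1"), "G4": ("F3", "O2")}
pairs = [("G1", "G1"), ("G1", "G2"), ("G1", "G3"), ("G1", "G4")]
print(error_levels(pairs, taxonomy))
# {'same_family': 1, 'other_family_same_order': 1, 'other_order': 1}
```

Under this scheme, an error "outside of the family" would be counted in either of the last two bins, so errors outside the order form a subset of errors outside the family.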
In this study, we attempted to control image quality by using masking to standardize images. Visual examination of the incorrectly classified images suggested that some were likely harder to classify because fish pixels had been incorrectly masked out, bisecting the fish. In short, we believe our masking rendered a few of our images unidentifiable; these errors are arguably an artifact of the data pipeline rather than true errors of the image classifier.
FIGURE 1 Example of unmasked (left) and masked (right) images of a fish (Bario steindachneri).
One way to improve final classification accuracy is to capture multiple clear images of each fish so that at least one is successfully masked before it is submitted for identification. Although the original images used in this study were taken at high resolution and in varying sizes, they were ultimately resized to just 300 × 300 pixels.
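The multiple-photo strategy suggested above could be automated by choosing, for each fish, the photo whose mask retains the most fish pixels before classification. This is a minimal sketch under stated assumptions: the share of pixels kept by the segmentation model is used as a crude proxy for masking quality, and both the function name and the `min_fraction` cutoff are hypothetical.

```python
def choose_best_image(candidates, min_fraction=0.05):
    """Pick the photo whose mask keeps the largest share of fish pixels.

    `candidates` is a list of (image_name, fish_fraction) pairs for one fish,
    where fish_fraction is the share of pixels the segmentation model kept.
    Returns None when every photo falls below `min_fraction` (a hypothetical
    cutoff), signalling that the fish should be photographed again.
    """
    usable = [(name, frac) for name, frac in candidates if frac >= min_fraction]
    if not usable:
        return None
    return max(usable, key=lambda pair: pair[1])[0]


print(choose_best_image([("shot1.jpg", 0.0), ("shot2.jpg", 0.31), ("shot3.jpg", 0.22)]))  # shot2.jpg
print(choose_best_image([("shot1.jpg", 0.0), ("shot2.jpg", 0.01)]))  # None
```

A larger retained area does not guarantee a correct mask (background could be kept by mistake), so in a field tool this check would complement, not replace, a visual check of the masked image.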
The rapid advancement of mobile phone photography (Rasmusson et al., 2004) and the availability of mobile phones with cameras capable of capturing images of even higher resolution than those used here (González & Pozo, 2019) will contribute to the growing quantity of high-quality image data available for building more robust and more accurate models. Standardized protocols for collecting photographic data in both field and museum settings will be key to bolstering future modeling efforts.
The remoteness of the localities sampled in this study and the cryptic nature of the species endemic to these sites significantly limited the number of images we were able to acquire in the field.
We addressed this limitation in image quantity by photographing museum specimens available to us. Using a hybrid approach (a combination of field images and digitized museum specimens), we were able to nearly double the amount of data available to build the model. Combining museum and field images in a single classification model can enable novel insights that may not emerge from separate museum and field models (Lendemer et al., 2020). The use of multiple data sources, and willingness to make these publicly available, will provide a robust […]
Most misidentifications in our model involved tetras, small characids that dominate the fish fauna of small Amazonian rivers and streams (de Oliveira et al., 2009). Historically, species-rich and closely related tetras have been difficult to identify because of cryptic species diversity (a single nominal species may in fact comprise several undescribed species) and the lack of exclusive morphological characters to identify some genera (e.g., Astyanax, > 170 species, and Hyphessobrycon, > 130 species; Barreto et al., 2017; Escobar-Camacho et al., 2015; Oliveira et al., 2011). In addition, an estimated 40% of species in the region have yet to be described (e.g., Reis et al., 2016). Thus, misidentifications in taxonomically complex groups, such as tetras and other cryptic assemblages, are common problems in both manual morphological and genetic identification approaches (e.g., de Santana et al., 2021), and this must be considered when building an image classifier for Amazonian fishes. In short, the output of an image classification model is only as good as the labels assigned to each class during training. If a target class is not well defined, as may be the case for tetras, classification accuracy for those genera may suffer.
Collection of accurate, reliable biodiversity data is vital for monitoring ecosystem health and its co-benefits for human well-being. The […] (Newman et al., 2012). Although previous efforts have applied image classification to citizen science data (Van Horn et al., 2018), none have targeted freshwater fish in sites as biologically and culturally diverse as the upper Morona River valley. Given the importance of fish as key indicators of water quality and ecosystem health (Harris, 1995), and the dependence of many Indigenous Amazonian communities on fish as a reliable source of protein (Swierk & Madigosky, 2014), there is a great need for tools that make the taxonomic identification required for accurate fish monitoring more accessible (Gardiner et al., 2012; Newman et al., 2012). When deployed in the field, our model will empower community-led initiatives in the Amazon River basin to collect more accurate information on fish and to identify ecological trends in this integral source of food and income (Finer et al., 2008).
TABLE 1 Summary of validation set (n = 596) results by genus.
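A per-genus summary of validation results, of the kind reported in Table 1, can be computed from (true genus, predicted genus) pairs. This is a minimal sketch: the exact columns of Table 1 are not recoverable here, so the `(n_images, n_correct, accuracy)` layout and the example predictions are hypothetical.

```python
from collections import defaultdict


def per_genus_summary(pairs):
    """Summarize (true_genus, predicted_genus) pairs per true genus.

    Returns {genus: (n_images, n_correct, accuracy)}, sorted by genus name.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true, pred in pairs:
        totals[true] += 1
        correct[true] += int(true == pred)
    return {g: (totals[g], correct[g], correct[g] / totals[g]) for g in sorted(totals)}


# Hypothetical validation predictions, for illustration only.
pairs = [("GenusA", "GenusA"), ("GenusA", "GenusB"), ("GenusB", "GenusB"), ("GenusB", "GenusB")]
for genus, (n, n_correct, acc) in per_genus_summary(pairs).items():
    print(f"{genus}\t{n}\t{n_correct}\t{acc:.1%}")
```

Grouping by the true genus means each row answers "how often is this genus recognized", which is the view most relevant to spotting poorly defined classes such as the tetra genera discussed above.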
While the model presented here is accurate at identifying fish to the genus level, in such a diverse ecosystem we expect it to be a first step, motivating increased digitization and image generation to obtain sufficient images for training a model at the species level. […] resources (Gardiner et al., 2012; Newman et al., 2012).

| CONCLUSIONS
We present an application that can be used to rapidly and accurately classify freshwater fish from the upper Morona River valley in the northwestern Amazon to genus for scientific research. Although currently limited to the 33 genera present in the study area, the model described here provides a solid foundation for future projects. The application, which classifies single images to genus, is accessible to the community online. The model's performance on images taken in geographic areas outside the northwestern Amazon has yet to be explored.
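Single-image classification ultimately turns the network's raw output scores into a ranked list of genera. The source does not describe the application's interface, so the sketch below only illustrates the generic softmax and top-k step; the scores, genus labels, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for one image over three genera.
for genus, p in top_k([2.0, 0.5, 1.0], ["GenusA", "GenusB", "GenusC"], k=2):
    print(f"{genus}: {p:.3f}")
```

Showing the top few genera with their probabilities, rather than a single label, may help users judge uncertain cases, for example among closely related tetras.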

ACKNOWLEDGMENTS
The authors thank the people of the Achuar native community of Brasilia for access to their territory and their interest and contribution to the project. We thank the Indigenous Socio-Environmental